wip – opened by mistake #2689

Closed
sdcoffey wants to merge 11 commits into main from dev/steve/app-server-python-types

Conversation

@sdcoffey
Collaborator

@sdcoffey sdcoffey commented Mar 17, 2026

No description provided.

sdcoffey and others added 11 commits March 16, 2026 13:04
…#6)

This pull request adds a native sandbox runtime to
`openai-agents-python` by porting the relevant `universal_computer`
sandbox pieces into `src/agents/sandbox` and reshaping them around the
existing Agents SDK model instead of preserving the old UC agent stack.

The main change is a new `SandboxAgent` + `SandboxAgentRunner` flow that
fits the normal `Agent`/`Runner` paradigm while still supporting
manifests, capabilities, session state, snapshots, and sandbox-backed
tools. Along the way, this PR ports the sandbox sessions, manifests,
entries, utilities, and Docker/Modal/E2B/Unix backends, removes the
stale `universal_computer` import paths, and makes the sandbox code
Python 3.10 compatible.

This pull request also cleans up the port rather than carrying forward
legacy UC shapes wholesale. In particular, it drops the old
PTY/session-terminal surface and the legacy context-manager-oriented DX,
uses concrete sandbox types instead of a protocol-heavy type layer, and
keeps Docker as an optional extra rather than a core dependency. To make
the new flow easier to try, it adds a Docker example that streams agent
text and tool activity as it runs, plus focused sandbox tests covering
manifests, snapshots, session behavior, and the runner integration.

---------

Co-authored-by: Kazuhiro Sera <seratch@openai.com>

This pull request fixes a shared sandbox persistence contract bug across
Docker, E2B, Modal, and Unix-local.

Before this change, Docker had branch-local logic to account for actual
in-workspace mount targets, but the other backends still reasoned mostly
from the logical manifest key. A manifest such as `entries={"logical":
Mount(..., mount_path=Path("actual"))}` could therefore exclude
`logical` while still persisting or snapshotting `actual` in Unix-local,
Modal, and parts of E2B. E2B also enumerated mounts only from top-level
manifest entries, so nested mounts under `Dir(...)` were never unmounted
before snapshotting.

The fix moves that contract into shared manifest and mount helpers. The
manifest now computes effective ephemeral persistence paths from both
logical entry paths and resolved in-workspace mount targets, and it
exposes deepest-first ephemeral mount targets for backends that need
temporary unmount/remount around persistence. Docker is refactored onto
those shared helpers, Unix-local and Modal now exclude relocated mount
targets consistently, and E2B now handles nested mounts plus
rollback-safe remount behavior when persistence setup fails partway
through.
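The shared contract can be illustrated with a minimal sketch. This is not the helper code in `src/agents/sandbox`: the `Mount` dataclass, `resolved_target`, `effective_ephemeral_paths`, and `ephemeral_mount_targets_deepest_first` are hypothetical names, and nested `Dir(...)` traversal is simplified to a flat mapping from logical path to entry.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Mount:
    """Hypothetical stand-in for a manifest mount entry."""

    mount_path: PurePosixPath | None = None  # relocated in-workspace target, if any
    ephemeral: bool = True


def resolved_target(logical: str, mount: Mount) -> PurePosixPath:
    # A mount lands at mount_path when set, otherwise at its logical key.
    return mount.mount_path if mount.mount_path is not None else PurePosixPath(logical)


def effective_ephemeral_paths(entries: dict[str, Mount]) -> set[PurePosixPath]:
    """Paths to skip when persisting: logical keys plus resolved targets."""
    paths: set[PurePosixPath] = set()
    for logical, mount in entries.items():
        if mount.ephemeral:
            paths.add(PurePosixPath(logical))
            paths.add(resolved_target(logical, mount))
    return paths


def ephemeral_mount_targets_deepest_first(entries: dict[str, Mount]) -> list[PurePosixPath]:
    """Order targets so nested mounts are unmounted before their parents."""
    targets = {resolved_target(key, mount) for key, mount in entries.items() if mount.ephemeral}
    return sorted(targets, key=lambda path: (-len(path.parts), str(path)))


entries = {
    "logical": Mount(mount_path=PurePosixPath("actual")),
    "data": Mount(),
    "data/cache": Mount(),
}
skip_paths = effective_ephemeral_paths(entries)
unmount_order = ephemeral_mount_targets_deepest_first(entries)
```

In this sketch both `logical` and `actual` end up in `skip_paths`, which is the case the description says Unix-local, Modal, and parts of E2B previously missed, and `data/cache` sorts ahead of `data` so a backend can unmount the nested target first.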

The regression coverage locks the contract at both the shared-helper and
backend levels. The new tests cover relocated `mount_path` exclusion,
nested mount traversal and ordering, E2B unmount/remount rollback, and
Modal persistence modes so these backends do not drift apart again.

This pull request adds a dedicated Codex sandbox artifact and updates
sandbox write plumbing so the GitHub release archive can be streamed
directly into the target box before being extracted in place.

- add target-aware Codex GitHub asset resolution for Linux and macOS,
with Windows explicitly unsupported for now
- auto-add a dummy-version Codex entry from the sandbox manifest at
`codex_relpath` unless an explicit entry already exists there
- stream write payloads through unix-local and Docker sandboxes so
artifact downloads no longer need SDK-side buffering
- add focused sandbox manifest, session, entry, and extract coverage for
the new behavior

This pull request fixes a sandbox ordering bug that broke the contract
of `InputGuardrail(run_in_parallel=False)` on the first turn.

Before this change, both `Runner.run()` and `Runner.run_streamed()`
called `SandboxRuntime.prepare_agent()` before running first-turn
sequential input guardrails. In sandbox-backed runs, that meant a
guardrail trip could still happen after sandbox side effects had already
occurred. Concretely, a blocked run could still create and start a
runner-owned sandbox session, start a stopped injected session, or apply
capability-driven manifest deltas to an already-running injected
session.

That behavior is wrong because sequential input guardrails are supposed
to run before the agent starts. For sandbox agents, "agent start" was
effectively happening too early through sandbox preparation and session
materialization.

This change moves first-turn sequential input guardrails ahead of
sandbox preparation for sandbox-enabled runs in both the non-streaming
and streaming execution paths. Parallel input guardrails still run
alongside the model work as before, so the fix is narrowly scoped to the
blocking-before-start path. The streamed guardrail helper was also
updated so early guardrail execution still records results correctly
even before an agent span exists.
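The ordering change can be sketched roughly as follows. The classes here are stand-ins rather than the SDK's real types: `FakeSandboxRuntime`, `GuardrailTripped`, `blocking_guardrail`, and `run_first_turn` are hypothetical names used only to make the side effects observable.

```python
import asyncio
from dataclasses import dataclass, field


class GuardrailTripped(Exception):
    """Hypothetical stand-in for the SDK's tripwire exception."""


@dataclass
class FakeSandboxRuntime:
    """Records side effects so the ordering can be observed."""

    events: list[str] = field(default_factory=list)

    async def prepare_agent(self) -> None:
        self.events.append("session created and started")


async def blocking_guardrail(user_input: str) -> None:
    # Stand-in for a sequential (run_in_parallel=False) input guardrail.
    if "forbidden" in user_input:
        raise GuardrailTripped(user_input)


async def run_first_turn(runtime: FakeSandboxRuntime, user_input: str) -> str:
    # Sequential guardrails run first ...
    await blocking_guardrail(user_input)
    # ... so sandbox preparation only happens for inputs that passed.
    await runtime.prepare_agent()
    return "model output"


async def demo() -> tuple[list[str], list[str]]:
    blocked, allowed = FakeSandboxRuntime(), FakeSandboxRuntime()
    try:
        await run_first_turn(blocked, "forbidden request")
    except GuardrailTripped:
        pass
    await run_first_turn(allowed, "hello")
    return blocked.events, allowed.events


blocked_events, allowed_events = asyncio.run(demo())
```

With the guardrail awaited before `prepare_agent()`, the blocked run records no sandbox events at all, which is the property the regression tests listed below are described as checking.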

The new regression tests cover all affected combinations:
- non-streamed runner-owned sandbox sessions,
- non-streamed running injected sessions,
- streamed runner-owned sandbox sessions,
- streamed running injected sessions.

Each test verifies that when the guardrail trips, sandbox preparation
produces no side effects: no session creation, no session start, and no
live-session manifest materialization.

#18)

This pull request fixes two correctness bugs in `WorkspaceJsonlSink` by
preserving pre-existing outbox history on the first flush and by
excluding ephemeral sink outputs through runtime-only persistence paths
instead of mutating the manifest. It updates the shared sandbox session
layer so Unix-local, Docker, Modal, and E2B persistence all honor the
same skip-path set, which keeps durable siblings under existing
directories while still pruning the sink outbox. It also adds
regressions covering pre-populated outboxes, repeated flushes,
existing-parent persistence, Docker staged-copy pruning, and Modal/E2B
tar exclusion wiring.
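A rough sketch of the two behaviors, assuming a simplified file-based sink: `JsonlOutboxSink` and `persistable_files` are hypothetical illustrations, not the `WorkspaceJsonlSink` implementation or the session layer's real persistence code.

```python
import json
import tempfile
from pathlib import Path


class JsonlOutboxSink:
    """Hypothetical sink; not the SDK's WorkspaceJsonlSink."""

    def __init__(self, outbox: Path) -> None:
        self.outbox = outbox
        self._pending: list[dict] = []

    def emit(self, event: dict) -> None:
        self._pending.append(event)

    def flush(self) -> None:
        # Append mode keeps whatever history the outbox already holds,
        # including on the very first flush.
        self.outbox.parent.mkdir(parents=True, exist_ok=True)
        with self.outbox.open("a", encoding="utf-8") as handle:
            for event in self._pending:
                handle.write(json.dumps(event) + "\n")
        self._pending.clear()


def persistable_files(root: Path, skip_paths: set[Path]) -> list[Path]:
    """List files to persist, pruning runtime-only skip paths."""
    kept = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        skipped = any(relative == skip or skip in relative.parents for skip in skip_paths)
        if path.is_file() and not skipped:
            kept.append(relative)
    return kept


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    outbox = root / "logs" / "outbox.jsonl"
    outbox.parent.mkdir()
    outbox.write_text('{"seq": 0}\n', encoding="utf-8")  # pre-existing history
    (root / "logs" / "notes.txt").write_text("durable sibling", encoding="utf-8")

    sink = JsonlOutboxSink(outbox)
    sink.emit({"seq": 1})
    sink.flush()
    sink.flush()  # a repeated flush with nothing pending writes nothing

    history = outbox.read_text(encoding="utf-8").splitlines()
    persisted = persistable_files(root, {Path("logs/outbox.jsonl")})
```

The skip set is passed in at persistence time instead of being written into a manifest, so the durable sibling `logs/notes.txt` survives while the outbox is pruned.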
This pull request fixes Modal snapshot-filesystem persistence so cleanup
failures while stripping ephemeral paths fail closed instead of silently
taking a snapshot. It checks the pre-snapshot `rm -rf` result before
calling `snapshot_filesystem()`, restores the backed-up ephemeral
payload when cleanup fails, and raises a structured archive-read error
with exit-code context. The change also adds regression coverage for the
non-zero cleanup path to ensure snapshotting is skipped and restore
still runs.
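The fail-closed flow might look like the sketch below. `FakeSandbox`, `ArchiveReadError`, and `persist` are hypothetical names, the real Modal calls are replaced by recorded method calls, and restoring the payload after a successful snapshot is an assumption of this sketch.

```python
from dataclasses import dataclass, field


class ArchiveReadError(RuntimeError):
    """Hypothetical structured error carrying exit-code context."""

    def __init__(self, message: str, exit_code: int) -> None:
        super().__init__(f"{message} (exit code {exit_code})")
        self.exit_code = exit_code


@dataclass
class FakeSandbox:
    """Hypothetical sandbox handle; rm_exit_code simulates the cleanup result."""

    rm_exit_code: int = 0
    calls: list[str] = field(default_factory=list)

    def backup_ephemeral(self) -> bytes:
        self.calls.append("backup")
        return b"ephemeral payload"

    def remove_ephemeral(self) -> int:
        self.calls.append("rm -rf")
        return self.rm_exit_code

    def restore_ephemeral(self, payload: bytes) -> None:
        self.calls.append("restore")

    def snapshot_filesystem(self) -> str:
        self.calls.append("snapshot")
        return "snapshot-id"


def persist(sandbox: FakeSandbox) -> str:
    payload = sandbox.backup_ephemeral()
    exit_code = sandbox.remove_ephemeral()
    if exit_code != 0:
        # Fail closed: put the payload back and never snapshot a dirty tree.
        sandbox.restore_ephemeral(payload)
        raise ArchiveReadError("failed to strip ephemeral paths", exit_code)
    try:
        return sandbox.snapshot_filesystem()
    finally:
        # Assumption: the payload is also restored after a normal snapshot.
        sandbox.restore_ephemeral(payload)


failing = FakeSandbox(rm_exit_code=1)
failure = None
try:
    persist(failing)
except ArchiveReadError as exc:
    failure = exc
```

On the failing path the recorded calls are backup, `rm -rf`, restore, with no snapshot in between.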
This pull request fixes a behavioral mismatch where
`SandboxRunConfig(session=...)` skipped `Capability.process_manifest()`
while `session_state` resume flows and fresh `client.create(...)`
sessions did not.

In practice, that meant capability-provided manifest changes only worked
in some sandbox startup modes. A capability that adds workspace
scaffolding such as `README.md`, `cap.txt`, helper files, or other
manifest-backed setup would behave correctly when the runtime created or
resumed a session, but silently fail when a caller injected an
already-created live session. The same agent and capability could
therefore produce different instructions, different visible workspace
files, and different tool preconditions depending only on whether the
run used `session=...` or `session_state=...`.

This was especially risky for long-lived injected sessions. Capability
tools still bound to the session, but any files or manifest-backed
instructions those tools expected were never added to
`session.state.manifest`, and for already-running sessions they were
never materialized into the workspace either. That creates a
hard-to-diagnose failure mode where capability-dependent runs break only
in the live-session configuration.

This change makes injected live sessions follow the same
manifest-processing path as the other sandbox acquisition modes. The
runtime now applies capability manifest mutations to the injected
session state, reapplies the manifest once for already-running injected
sessions so capability-owned files exist without restarting the session,
and preserves the existing caller-owned lifecycle semantics for injected
sessions.

This pull request fixes sandbox ZIP extraction compatibility for
helper-call paths that receive non-seekable archive streams.

The change replaces the old `seekable()`-presence heuristic with an
actual random-access probe in
`src/agents/sandbox/session/archive_extraction.py`. Streams that already
support `tell()` and `seek()` are passed through unchanged, while
non-seekable streams are copied into a rewindable `SpooledTemporaryFile`
before `zipfile.ZipFile(...)` reads them. This keeps the normal
`BaseSandboxSession.extract()` behavior unchanged while making the
lower-level ZIP helper correct for future direct callers.
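A minimal sketch of the probe-then-spool approach, using only the standard library. The helper names `supports_random_access` and `zip_compatible_stream`, the `ReadOnlyStream` test double, and the 1 MiB in-memory threshold are assumptions for illustration, not the code in `archive_extraction.py`.

```python
import io
import shutil
import tempfile
import zipfile
from typing import Any


def supports_random_access(stream: Any) -> bool:
    """Probe tell()/seek() directly instead of trusting seekable()."""
    try:
        position = stream.tell()
        stream.seek(position)
    except (AttributeError, OSError, ValueError):
        # Missing methods or io.UnsupportedOperation both mean "not seekable".
        return False
    return True


def zip_compatible_stream(stream: Any, max_memory: int = 1 << 20) -> Any:
    """Return the stream unchanged if rewindable, else a spooled copy."""
    if supports_random_access(stream):
        return stream
    spooled = tempfile.SpooledTemporaryFile(max_size=max_memory)
    shutil.copyfileobj(stream, spooled)
    spooled.seek(0)
    return spooled


class ReadOnlyStream:
    """Stream exposing read() only, similar to a network response body."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)

    def read(self, size: int = -1) -> bytes:
        return self._buffer.read(size)


# Build a small archive in memory, then feed it through a non-seekable wrapper.
payload = io.BytesIO()
with zipfile.ZipFile(payload, "w") as archive:
    archive.writestr("hello.txt", "hi")

stream = zip_compatible_stream(ReadOnlyStream(payload.getvalue()))
with zipfile.ZipFile(stream) as archive:
    names = archive.namelist()
```

Probing `tell()` and `seek()` directly covers both cases named in the tests: streams that lack a `seekable()` method but are rewindable pass through, and streams that cannot seek are copied first.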

The pull request also removes the now-unused private
`_zipfile_compatible_stream()` seam from
`src/agents/sandbox/session/base_sandbox_session.py` and updates
`tests/test_sandbox_extract.py` to cover both valid random-access
streams without a `seekable()` method and streams whose `seekable()`
method explicitly returns `False`.

This pull request adds a sandbox `Skills` capability and ports the
tax-prep packaged-agent demo onto the sandbox runtime used in this
repository.

This branch includes four follow-up commits:
- `f25d142f` fixes Modal archive writes and hardens `ls` path parsing.
It also renames the Docker runner example to
`examples/sandbox/basic.py`, updates the extensions README, and adds
regression coverage.
- `62a37eda` makes sandbox Codex installs ephemeral by default so Codex
artifacts are treated as runtime-only state.
- `3cb9e996` adds the new skills capability, focused tests, a
sandbox-backed apply-patch helper for examples, and a Docker-based
`examples/sandbox/tax_prep.py` demo with sample tax PDFs.
- `88e45e3e` thins the sandbox skills capability so it only mounts
skills into a Codex auto-discovery root (defaulting to `.agents/skills`)
and no longer injects prompt-side skill indexes or custom skill
instructions.

Notes:
- I intentionally left unrelated local files uncommitted, including
`2026-03-16__victoria_zheng/`.
- The full local verification stack now passes: `make format`, `make
lint`, `make typecheck`, and `make tests`.
@sdcoffey sdcoffey closed this Mar 17, 2026
@sdcoffey sdcoffey deleted the dev/steve/app-server-python-types branch March 17, 2026 00:28

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6b7b7b60fc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@openai openai deleted a comment from chatgpt-codex-connector Bot Mar 17, 2026
@sdcoffey sdcoffey changed the title from "feat: add remote sandbox app-server websocket client" to "wip – opened by mistake" Mar 17, 2026

2 participants